Whether Skill Acquisition is Rule or Instance Based is determined by the Structure of the Task
Abstract
The traditional view of skill acquisition is that it can be explained by a gradual transition from behavior based on declarative rules in the form of examples and instructions towards general knowledge represented by procedural rules. This view is challenged by Logan’s instance theory, which specifies that skill acquisition can be explained by the accumulation of examples or instances of the skill. The position defended in this paper is that both types of learning can occur, but their success depends on the respective task. In the Sugar Factory task, it is very hard to determine the rule guiding the system, so rule learning will fail and instance learning dominates. In the Fincham task, mainly rule learning occurs, but variations in the task show evidence for some instance learning as well. Experiments with both tasks are modeled using ACT-R, a hybrid cognitive architecture whose adaptive learning mechanisms seem to be well suited for modeling two very different tasks using the same methods.
Skill Acquisition: Rule or Instance based?
Introduction
The question whether skills are represented as abstract rule-like entities or as sets of concrete instances taps one of the central distinctions in cognitive science, spreading across fields as diverse as research on memory, problem solving, categorization or language learning (Logan, 1988; Hahn & Chater, 1998; Redington & Chater, 1996; Plunkett & Marchman, 1991; Lebiere, Wallach, & Taatgen, 1998; Taatgen & Anderson, submitted). Recently, Hahn and Chater (1998) proposed that the distinction between instance- and rule-based learning mechanisms cannot be based on different types of representations, but must involve a framework of their use in problem solving. In this paper we extend their argument and emphasize the necessity of an integrative investigation of human skill acquisition using a comprehensive theory of cognition.
The view of skill acquisition as learning and following abstract rules has dominated theories of skill acquisition over the last decades, whether instantiated as production systems (Newell & Simon, 1972; Anderson, 1993), stored as logical implications (Rips, 1994) or represented in classifier systems (Holland, Holyoak, Nisbett & Thagard, 1986). While these approaches differ on various dimensions, they share the assumption that cognitive skills are realized as abstract rules that are applied to specific facts when solving problems. In ACT-R, a comprehensive architecture of human cognition (Anderson & Lebiere, 1998), it is assumed that people start out with specific examples or instances of previous problem solving episodes that can potentially be generalized to abstract rules. These rules can be applied in subsequent problem solving and can thus account for increased performance. Discontinuous improvements in cognitive performance as reported by Blessing and Anderson (1996) or Haider (1997) can be taken as evidence for the acquisition of new rules that allow for an increase in observed performance.
While Anderson (1993) describes the view that cognitive skills are realized as (production) rules as “one of the most important discoveries” in cognitive psychology, Logan (1988) argues for domain-specific instances as the representational basis for cognitive skills. According to Logan’s instance theory, general-purpose procedures or methods are applied to solve novel problems. Each time such a procedure is used in problem solving, its result is encoded as a separate instance. For new problems, the solution can be calculated using general procedures, or the solution to a previous problem can be retrieved and applied to the current task. Retrieved episodes can be used as a whole, in part, or in an adapted version to obtain solutions to new problems.
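The instance-based account just described can be illustrated with a small sketch. It is not Logan's model (which, for example, treats retrieval and computation as competing processes); it only shows the two routes to a solution mentioned above: computing with a general procedure and storing the result, or retrieving a stored instance. The names `general_procedure` and `InstanceLearner`, and the choice of counting-up addition as the slow general method, are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Problem = Tuple[int, int]


def general_procedure(problem: Problem) -> int:
    """Hypothetical general-purpose method: add by counting up one step at a time."""
    total, steps = problem
    for _ in range(steps):
        total += 1
    return total


class InstanceLearner:
    """Solves by retrieval when an instance exists, otherwise by computation."""

    def __init__(self, procedure: Callable[[Problem], int]) -> None:
        self.procedure = procedure
        # Each use of the procedure is encoded as a separate instance.
        self.instances: Dict[Problem, List[int]] = {}
        self.retrievals = 0
        self.computations = 0

    def solve(self, problem: Problem) -> int:
        stored = self.instances.get(problem)
        if stored:
            # A previous episode for this exact problem is available.
            self.retrievals += 1
            return stored[-1]
        # No instance yet: fall back on the general procedure and encode the result.
        self.computations += 1
        answer = self.procedure(problem)
        self.instances.setdefault(problem, []).append(answer)
        return answer
```

Under these assumptions, solving `(6, 2)` twice costs one computation and one retrieval, while a new problem such as `(6, 3)` still has to be computed: practice on one example helps that example only.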
One source of empirical support for the instance-based approach is the fact that repeating a certain specific example of a problem increases performance on this example, but not on others (Wallach, submitted). The fact that participants frequently cannot verbalize abstract knowledge about problems solved also seems to provide evidence against some form of generalization as implied by rule-based skill theories. The ACT-R theory that forms the basis of this article, however, assumes that rules themselves cannot consciously be inspected, so this phenomenon is at least not without an alternative explanation.
Further evidence that knowledge is represented as rules, more specifically production rules, comes from research on the directional asymmetry of rules. A production rule consists of two parts, a condition part and an action part, which we informally denote as ‘IF condition THEN action’. In a production system, control always flows from the condition to the action, i.e. if the specified condition is met, then the action can potentially be executed. In many practical cases, the condition and the action are both part of a pattern. If we assume, for example, the pattern AB, a rule like ‘IF A THEN B’ can be used to complete the pattern given A. In an instance approach, the pattern AB can be stored as an instance, and retrieved given either A or B. If participants are trained to complete some pattern AB on the basis of A, a rule approach predicts that they learn the rule ‘IF A THEN B’, and the instance approach predicts that they learn the instance AB. If participants are subsequently asked to complete AB on the basis of B, the instance approach would not predict a significant decrease in performance. In the rule-based approach, with its assumption that control always flows from condition to action, however, the rule ‘IF A THEN B’ would be useless to complete AB on the basis of B.
In that case a new rule ‘IF B THEN A’ would have to be learned, resulting in severe performance degradations (Kessler, 1988; Rabinowitz & Goldberg, 1995).
Another apparent source of evidence stems from the fact that rules are more general than instances, which are assumed to be represented in a relatively unprocessed form (Redington & Chater, 1996). If participants show increased performance on problems they have not encountered before, some generalized knowledge can be postulated as the basis of the observed performance. If we assume that stored instances can only be applied when the new problem at hand is literally identical to encoded examples, no performance increase on new problems is to be expected. However, if one or more old examples (or fragments of them) can be used to improve performance on a new example in a less direct fashion (Lebiere, Wallach & Taatgen, 1998; Lebiere, 1999), generalization is also possible in an instance-based setting. Consequently, if generalization in transfer experiments is used as evidence against instance theory, it must be ruled out that the answer to a certain problem can easily be derived from stored answers to previous problems. As Redington and Chater (1996) have pointed out, surprisingly simple models, relying on represented fragments of observed stimuli, can perform exceedingly well in transfer tasks without acquiring any abstract knowledge. An example of such a model will be discussed in a later section, when we demonstrate the scope of a purely instance-based approach in accounting for data that Broadbent (1989) has interpreted as evidence against the claim that general rules are learned on the basis of previously stored examples.
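The directional-asymmetry argument made earlier in this section (a rule ‘IF A THEN B’ versus a stored instance AB) can be made concrete with a minimal sketch. The classes `RuleStore` and `InstanceStore` are hypothetical names introduced here; they stand in for the two kinds of representation and are not taken from any of the cited models.

```python
from typing import Dict, List, Optional, Tuple


class RuleStore:
    """Production-like rules: control flows only from condition to action."""

    def __init__(self) -> None:
        self.rules: Dict[str, str] = {}

    def learn(self, condition: str, action: str) -> None:
        self.rules[condition] = action

    def complete(self, cue: str) -> Optional[str]:
        # Only a cue that matches a condition can fire a rule.
        return self.rules.get(cue)


class InstanceStore:
    """Whole patterns stored as instances: either element can cue retrieval."""

    def __init__(self) -> None:
        self.instances: List[Tuple[str, str]] = []

    def learn(self, first: str, second: str) -> None:
        self.instances.append((first, second))

    def complete(self, cue: str) -> Optional[str]:
        for first, second in self.instances:
            if cue == first:
                return second
            if cue == second:
                return first
        return None
```

After training both stores on the pattern AB from the cue A, the rule store completes A but returns nothing for B, whereas the instance store completes the pattern from either element. This is the difference in predicted transfer that the asymmetry experiments exploit.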
The results of Broadbent and colleagues on dissociations between knowledge and performance seem to imply that participants can acquire rules to successfully operate complex systems without showing increased scores when answering questions about the system’s behavior. The instance-based ACT-R model proposed in this paper will provide a very simple alternative explanation for this dissociation result. Before presenting models of rule- and instance-based skill acquisition in a unified theory of cognition, the next section provides the theoretical framework underlying our approach.
Unified theories of Cognition
In 1990 Newell published the book “Unified Theories of Cognition”, in which he elaborated the ambitious vision of a comprehensive theory of cognition, an approach he first outlined in 1973 (Newell, 1973). According to Newell, instead of developing micro-theories for every separate phenomenon, psychology should aim at unification. The ultimate goal, following Newell, is an integrative theory that encompasses a broad set of theoretical approaches: a truly Unified Theory of Cognition. Emphasizing the capability to make detailed predictions, Newell imagined a computational theory that is based on a cognitive architecture. A cognitive architecture, in analogy to the architecture of a computer, conceptualizes and implements the structures and mechanisms that are postulated to form the basis of human cognition. To be able to explain cognitive phenomena and to make predictions, such an architecture has to be supplied with a model of a specific task. A task model takes the form of a set of initial knowledge that is encoded in the representational structures provided by the architecture. In the case of a model of expert behavior this may be an extensive body of task-specific knowledge.
The task model of novice behavior, on the other hand, might only contain very general knowledge and some specific knowledge that is supposed to be acquired from instructions. Figure 1 illustrates the described research paradigm for cognitive modeling based on cognitive architectures. In his 1990 book Allen Newell acknowledged the fact that psychology is not ready yet for a single integrated theory. He presented Soar, a candidate architecture of his own, but also called upon his fellow researchers to design alternative architectures. Since the development of Soar and Newell’s challenge, three other well-known architectures have been implemented on the basis of production systems: EPIC (Meyer & Kieras, 1997), CAPS (Just & Carpenter, 1992) and ACT-R (Anderson, 1993; Anderson & Lebiere, 1998). The models that we discuss in this article rely on the ACT-R architecture. The other two architectures, EPIC and CAPS, do not incorporate learning (yet), and are therefore not suitable for modeling knowledge acquisition processes. Although the Soar architecture does incorporate learning, its purely symbolic nature makes it hard to model the subtle effects of gradual learning and forgetting that characterize the experiments modeled in this paper (see also Taatgen, 1999). The next section provides a brief sketch of the basic theoretical concepts of the ACT-R architecture.
The ACT-R theory
The ACT-R theory rests upon two important components: rational analysis (Anderson, 1990) and the distinction between procedural and declarative memory (Anderson, 1976). According to rational analysis, each component of the human cognitive architecture is optimized with respect to demands from the environment, given its computational limitations. Rational analysis assumes that the functioning of an architectural component can be derived by considering how the component in question can work as optimally as possible in a given environment.
Anderson (1990) relates this optimality claim to evolution that is shaping the architecture. An example of this principle is the way choice is implemented in ACT-R. Whenever there is a choice between strategies to use or memory elements to retrieve, the architecture will take the one that has the highest expected gain, i.e. the choice that has the lowest expected cost while having the highest expected outcome. The principle of rational analysis can also be applied to task knowledge. While evolution shapes the architecture, learning shapes the knowledge and parts of the knowledge acquisition processes. Instead of only being focused on acquiring knowledge per se, learning should also aim at finding its right representation. This may imply that learning has to attempt several different ways to represent knowledge, so that the optimal one can be sorted out. This principle will underlie the models we will present later on.
The distinction between procedural and declarative memory has been studied quite extensively from different perspectives in psychology and the neurosciences (Anderson, 1976; Squire, 1992). While ACT-R’s declarative memory is conceptualized as a store for factual knowledge, elements of procedural memory encode production rules. The ACT-R architecture does not postulate a separate working memory but instead enhances declarative memory by an activation concept (see below) to control access to facts. Only declarative memory elements, so-called chunks, above a certain threshold can be retrieved and deployed in problem solving. To keep track of the current context, ACT-R uses a goal stack which organizes the system’s intentions. The top element of the goal stack is called the focus of attention, a reference to an element in declarative memory that represents the current goal. Figure 2 presents an overview of the processes and memory systems of ACT-R.
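The choice principle described earlier in this section (take the option with the highest expected gain) can be sketched as follows. The sketch assumes the form commonly given for ACT-R's expected gain, $E = P \cdot G - C$, with $P$ the estimated probability of success, $G$ the value of the goal and $C$ the expected cost; the text above does not spell out this formula, and the option names and numbers are purely illustrative. Noise in the evaluation is left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Option:
    name: str
    p_success: float  # P: estimated probability that the option reaches the goal
    cost: float       # C: expected cost (e.g. time) of using the option


def expected_gain(option: Option, goal_value: float) -> float:
    """Assumed form E = P * G - C; higher is better."""
    return option.p_success * goal_value - option.cost


def choose(options: Sequence[Option], goal_value: float) -> Option:
    """Pick the option with the highest expected gain (no noise in this sketch)."""
    return max(options, key=lambda o: expected_gain(o, goal_value))
```

With a cheap but unreliable option (`p_success=0.7`, `cost=1.0`) and a reliable but expensive one (`p_success=0.95`, `cost=5.0`), the reliable option wins when the goal is worth 20, and the cheap one wins when the goal is worth only 10. The same knowledge can therefore lead to different choices depending on how much the goal is worth.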
ACT-R’s symbolic level
ACT-R comprises two levels of description: a symbolic and a subsymbolic level. On the symbolic level representations in memory are discrete items, and processing applies procedural items to declarative items in the recognize-act-cycle typical for production systems (Waterman & Hayes-Roth, 1976). Declarative memory uses chunks to represent knowledge. A chunk stores information in a propositional fashion and may contain a certain fact, the current or previous goals as well as perceptual information. An example of a goal chunk, in which two is added to six and the answer has not yet been found is:
GOAL23
   ISA ADDITION
   ADDEND1 SIX
   ADDEND2 TWO
   ANSWER NIL
In this example, ADDEND1, ADDEND2 and ANSWER are slots in chunk GOAL23, and SIX and TWO are fillers for these slots. SIX and TWO are references to other chunks in declarative memory, thus forming an interrelated structure of embedded chunks in declarative memory. The ANSWER slot has a value of NIL, meaning that the slot has no filler, i.e. the answer is not known yet. Let us assume that the chunk GOAL23 designates the current goal. If ACT-R manages to fill the ANSWER slot and focuses its attention on some other goal, GOAL23 will become part of declarative memory and take the role of a fact representing that six plus two equals eight. GOAL23 can then be retrieved for later problem solving.
Procedural information is represented in production memory by production rules. A production rule has two main components: a condition-part and an action-part. The condition-part contains patterns that match the current goal and possibly other elements in declarative memory. The action-part can modify slot-values in the goal and may create subgoals (or external actions, which will not be discussed here). In the recognize-act-cycle declarative elements are “matched” to the patterns in the condition-part of a production rule and applied in its action-part.
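The chunk notation above can be mirrored in a small data-structure sketch. It is only an illustration of slots, fillers and the NIL value, not ACT-R's own implementation; the helper names `make_addition_goal` and `retrieve_addition` are hypothetical, and Python's `None` is assumed to play the role of NIL.

```python
from typing import Dict, Optional

# A chunk maps slot names to fillers; None stands in for NIL (no filler yet).
Chunk = Dict[str, Optional[str]]


def make_addition_goal(addend1: str, addend2: str) -> Chunk:
    """Create an ADDITION goal whose ANSWER slot is still empty."""
    return {"ISA": "ADDITION", "ADDEND1": addend1, "ADDEND2": addend2, "ANSWER": None}


def retrieve_addition(memory: Dict[str, Chunk], addend1: str, addend2: str) -> Optional[str]:
    """Return the answer of a stored addition fact matching both addends, if any."""
    for chunk in memory.values():
        if (chunk.get("ISA") == "ADDITION"
                and chunk.get("ADDEND1") == addend1
                and chunk.get("ADDEND2") == addend2
                and chunk.get("ANSWER") is not None):
            return chunk["ANSWER"]
    return None


# GOAL23 starts as the current goal with an empty ANSWER slot.
goal23 = make_addition_goal("SIX", "TWO")
declarative_memory: Dict[str, Chunk] = {}

# Once the answer is filled in and attention moves on, the goal is kept as a fact.
goal23["ANSWER"] = "EIGHT"
declarative_memory["GOAL23"] = goal23
```

After these steps, `retrieve_addition(declarative_memory, "SIX", "TWO")` yields `"EIGHT"`, which corresponds to the former goal now serving as the fact that six plus two equals eight.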
An example of a rule that tries to solve a subtraction problem by retrieving an addition chunk might look like:
IF the goal is to subtract num2 from num1 and there is no answer
AND there is an addition chunk that num2 plus num3 equals num1
THEN put num3 in the answer-slot of the goal
This example shows an important aspect of production rules, namely variables. Variables allow for the applicability of a production in a class of situations and thus determine the abstract character of procedural knowledge. The symbols num1, num2 and num3 of the production shown above denote variables that can be instantiated by any value (with the restriction that same variables have to be bound to the same value). The rule above can thus find the answer to any subtraction problem, given that the necessary addition chunk is available in declarative memory.
ACT-R’s subsymbolic level
The symbolic level provides for the basic building blocks of ACT-R. Using only this level already allows for several interesting models for tasks in which a clearly defined set of rules can be applied; the “Tower of Hanoi” (Anzai & Simon, 1978) is a famous example of such a task (Anderson, Kushmerick & Lebiere, 1993). The symbolic level, however, leaves a number of details unspecified. The main topic that it delegates to the subsymbolic level is choice. Choices must be made when there is more than one production rule that can match in a situation, or when there is more than one chunk that matches the condition pattern in a production rule. Other matters that are taken care of by the subsymbolic level are accounts for errors and forgetting, as well as the prediction of latencies.
At the subsymbolic level each rule or chunk is annotated with a number of numerical quantities. In the case of chunks, these parameters are used to calculate an estimate of the likelihood that the chunk is needed given the current context.
This estimate, called the activation of a chunk, has two components: a base-level activation, that represents the relevance of the chunk by itself, and a context activation through association strengths with fillers of the current goal chunk. Figure 3 shows an example in the case of the subtraction problem 8-3=?. The fact that eight and three are part of the context increases the probability that chunks associated with eight and three are needed. In this case 3+5=8 will get extra activation through both three and eight. The activation process is computed by the following equation:
$$A_i = B_i + \sum_j W_j S_{ji} + \text{noise}$$
In this equation, $A_i$ is the total activation of chunk $i$. This total activation has three parts: base-level activation ($B_i$), context activation ($\sum_j W_j S_{ji}$), and noise. As context activation does not play any important role in the models discussed here, we will leave out the details (see Anderson & Lebiere, 1998, p. 71). The activation level of a chunk has a number of consequences for its processing. If the activation is below a certain threshold, it cannot be retrieved by patterns in the condition part of a production. If there is more than one chunk that matches the pattern in a production rule, the chunk with the highest activation is chosen. As activation has a noise component, these processes are stochastic: the chunk with the highest activation is not guaranteed to be selected, but has the highest probability of being retrieved. Differences in activation levels can lead to mismatches, in which a chunk with a high activation that does not completely match the production rule is selected, while ...
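The interplay of activation, threshold and production matching described in this section can be sketched in a few lines. The sketch applies the activation equation $A_i = B_i + \sum_j W_j S_{ji} + \text{noise}$ to the subtraction-by-retrieval rule from the symbolic level. Several things are assumptions made for illustration only: the class `AdditionFact`, the equal split of the weights $W_j$ over the two goal fillers, Gaussian noise, and all numerical values. Partial matching (the mismatches mentioned above) is not modeled.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AdditionFact:
    addend1: int
    addend2: int
    total: int
    base_level: float                                           # B_i
    strengths: Dict[int, float] = field(default_factory=dict)   # S_ji, keyed by context element j


def activation(chunk: AdditionFact, weights: Dict[int, float],
               noise_sd: float, rng: random.Random) -> float:
    """A_i = B_i + sum_j W_j * S_ji + noise (Gaussian noise is an assumption)."""
    context = sum(w * chunk.strengths.get(j, 0.0) for j, w in weights.items())
    noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
    return chunk.base_level + context + noise


def subtract_by_retrieval(memory: List[AdditionFact], num1: int, num2: int,
                          threshold: float, noise_sd: float = 0.0,
                          rng: Optional[random.Random] = None) -> Optional[int]:
    """IF the goal is num1 - num2 AND a chunk num2 + num3 = num1 exists THEN answer num3."""
    rng = rng or random.Random()
    weights = {num1: 0.5, num2: 0.5}  # W_j: assumed equal split over the goal fillers
    best, best_activation = None, threshold
    for chunk in memory:
        if chunk.addend1 != num2 or chunk.total != num1:
            continue  # pattern does not match; the variables cannot be bound
        a = activation(chunk, weights, noise_sd, rng)
        if a > best_activation:  # must exceed the threshold and any earlier candidate
            best, best_activation = chunk, a
    return None if best is None else best.addend2
```

For the problem 8-3=?, a stored fact 3+5=8 with association strengths to both 3 and 8 receives context activation from both fillers and is retrieved if its total activation exceeds the threshold; raising the threshold above that activation makes the retrieval fail, so the rule cannot fire.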